In audio signal processing, auditory masking occurs when the perception of one sound is affected by the presence of another sound [Gelfand, S.A. (2004) Hearing – An Introduction to Psychological and Physiological Acoustics, 4th Ed. New York, Marcel Dekker].
Auditory masking in the frequency domain is known as simultaneous masking, frequency masking or spectral masking. Auditory masking in the time domain is known as temporal masking or non-simultaneous masking.
Gelfand (2004) provides a basic example. Suppose that, for a given individual, the sound of a cat scratching a post in an otherwise quiet environment is first audible at a level of 10 dB SPL. In the presence of a masking noise (for example, a vacuum cleaner running at the same time), that same individual cannot detect the scratching unless its level is at least 26 dB SPL. The unmasked threshold for that individual for the target sound (i.e., the cat scratching) is therefore 10 dB SPL, while the masked threshold is 26 dB SPL. The amount of masking is simply the difference between these two thresholds: 16 dB.
The amount of masking varies with the characteristics of both the target signal and the masker, and is also specific to the individual listener. While the person in the example above could detect the cat scratching at 26 dB SPL, another person might not hear it over the vacuum cleaner until its level was raised to 30 dB SPL. Assuming the same unmasked threshold of 10 dB SPL, the amount of masking for the second listener would be 20 dB.
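The arithmetic in the example above can be written out as a short Python sketch. The function name `amount_of_masking_db` is hypothetical, and the second listener is assumed to share the 10 dB SPL unmasked threshold of the first.

```python
def amount_of_masking_db(unmasked_threshold_db_spl, masked_threshold_db_spl):
    """Return the amount of masking: masked threshold minus unmasked threshold (dB)."""
    return masked_threshold_db_spl - unmasked_threshold_db_spl


# First listener from the example: 10 dB SPL unmasked, 26 dB SPL masked.
print(amount_of_masking_db(10, 26))  # 16

# Second listener: 30 dB SPL masked; the 10 dB SPL unmasked threshold is an assumption.
print(amount_of_masking_db(10, 30))  # 20
```

Because both thresholds are expressed in dB SPL, the difference is a level difference in dB and needs no further conversion.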
The filters that distinguish one sound from another are called auditory filters, listening channels or critical bands. Frequency resolution occurs on the basilar membrane, with the listener selecting a filter centered on the frequency they expect to hear, the signal frequency. A sharply tuned filter has good frequency resolution because it passes frequencies near its center and rejects others (Pickles 1982). Damage to the cochlea, and to the outer hair cells within it, can impair the ability to tell sounds apart (Moore 1986). This explains why someone with a hearing loss due to cochlear damage has more difficulty than a person with normal hearing in distinguishing between different consonants in speech [Moore, B.C.J. (1995) Perceptual Consequences of Cochlear Damage, Oxford, Oxford University Press].
Masking illustrates the limits of frequency selectivity. If a signal is masked by a masker whose frequency differs from that of the signal, then the auditory system has been unable to distinguish between the two frequencies. By experimenting with conditions where one sound can mask a previously heard signal, the frequency selectivity of the auditory system can be tested [Moore, B.C.J. (1998) Cochlear Hearing Loss, London, Whurr Publishers Ltd].
Figure B shows along the Y axis the amount of masking. The greatest masking is when the masker and the signal are the same frequency and this decreases as the signal frequency moves further away from the masker frequency. This phenomenon is called on-frequency masking and occurs because the masker and signal are within the same auditory filter (Figure C). This means that the listener cannot distinguish between them and they are perceived as one sound with the quieter sound masked by the louder one (Figure D).
The amount by which the masker raises the threshold of the signal is much smaller in off-frequency masking, but there is still some masking effect because part of the masker overlaps into the auditory filter of the signal (Figure E).
Off-frequency masking requires the level of the masker to be greater in order to have a masking effect; this is shown in Figure F. This is because only part of the masker overlaps into the auditory filter of the signal, so more masker is needed to cover the signal.
Figure B also shows that as the masker frequency increases, the masking patterns become increasingly compressed. This demonstrates that high frequency maskers are only effective over a narrow range of frequencies, close to the masker frequency. Low frequency maskers on the other hand are effective over a wide frequency range.
Harvey Fletcher carried out an experiment to discover how much of a band of noise contributes to the masking of a tone. In the experiment, a fixed tone signal had various bandwidths of noise centered on it. The masked threshold was recorded for each bandwidth. His research showed that there is a critical bandwidth of noise which causes the maximum masking effect and energy outside that band does not affect the masking. This can be explained by the auditory system having an auditory filter which is centered over the frequency of the tone. The bandwidth of the masker that is within this auditory filter effectively masks the tone but the masker outside of the filter has no effect (Figure G).
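Fletcher's finding can be illustrated with a minimal sketch that treats the auditory filter as an idealized rectangular band, so that only noise power inside the critical bandwidth counts toward masking. This rectangular-filter simplification, the function name, the constant `k_db`, and all numeric values below are assumptions chosen for illustration; none of them come from the text above.

```python
import math


def masked_threshold_db(noise_spectrum_level_db, noise_bandwidth_hz,
                        critical_bandwidth_hz, k_db=0.0):
    """Idealized masked threshold of a tone centered in a band of noise.

    Assumes a rectangular auditory filter: noise power inside the critical
    bandwidth adds to the masking, power outside it is ignored.
    noise_spectrum_level_db is the noise level per 1 Hz band; k_db is a
    hypothetical detection-efficiency offset.
    """
    effective_bandwidth_hz = min(noise_bandwidth_hz, critical_bandwidth_hz)
    noise_in_filter_db = noise_spectrum_level_db + 10 * math.log10(effective_bandwidth_hz)
    return noise_in_filter_db + k_db


# Illustrative values only: 20 dB spectrum level, 160 Hz critical bandwidth.
for bandwidth_hz in (40, 80, 160, 320, 640):
    print(bandwidth_hz, round(masked_threshold_db(20, bandwidth_hz, 160), 1))
```

In this sketch the threshold rises by about 3 dB per doubling of noise bandwidth until the bandwidth reaches the critical bandwidth, and stays constant beyond it, which is the pattern the experiment describes. Real auditory filters are not rectangular, so measured curves flatten more gradually.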
This is used in MP3 files to reduce the size of audio files. Parts of the signals which are outside the critical bandwidth are represented with reduced precision. The parts of the signals which are perceived by the listener are reproduced with higher fidelity.
Similar to simultaneous masking, temporal masking reveals the frequency analysis performed by the auditory system; forward masking thresholds for complex harmonic tones (e.g., a sawtooth probe with a fundamental frequency of 500 Hz) exhibit threshold peaks (i.e., high masking levels) for frequency bands centered on the first several harmonics. In fact, auditory bandwidths measured from forward masking thresholds are narrower and more accurate than those measured using simultaneous masking.
Temporal masking should not be confused with the ear's acoustic reflex, an involuntary response in the middle ear that is activated to protect the ear's delicate structures from loud sounds.
The last situation where masking occurs is called central masking. This refers to the case where a masker causes a threshold elevation. This can be in the absence of, or in addition to, another effect and is due to interactions within the central nervous system between the separate neural inputs obtained from the masker and the signal.
When a sinusoidal signal and a sinusoidal masker (tone) are presented simultaneously the envelope of the combined stimulus fluctuates in a regular pattern described as beats. The fluctuations occur at a rate defined by the difference between the frequencies of the two sounds. If the frequency difference is small then the sound is perceived as a periodic change in the loudness of a single tone. If the beats are fast then this can be described as a sensation of roughness. When there is a large frequency separation, the two components are heard as separate tones without roughness or beats. Beats can be a cue to the presence of a signal even when the signal itself is not audible. The influence of beats can be reduced by using a narrowband noise rather than a sinusoidal tone for either signal or masker.
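The relation between the two frequencies and the beat rate can be checked numerically. The sketch below assumes two equal-amplitude sine tones with zero starting phase, for which the sum equals a carrier at the mean frequency multiplied by a slowly varying envelope; the function names and the 1000 Hz and 1004 Hz values are illustrative assumptions.

```python
import math


def beat_rate_hz(f_signal_hz, f_masker_hz):
    """Rate of envelope fluctuation: the difference between the two frequencies."""
    return abs(f_signal_hz - f_masker_hz)


def two_tone_sample(f1_hz, f2_hz, t_s):
    """Sum of two unit-amplitude sine tones at time t_s (seconds)."""
    return math.sin(2 * math.pi * f1_hz * t_s) + math.sin(2 * math.pi * f2_hz * t_s)


def envelope(f1_hz, f2_hz, t_s):
    """Envelope of the two-tone sum, from sin a + sin b = 2 sin((a+b)/2) cos((a-b)/2)."""
    return abs(2 * math.cos(math.pi * (f1_hz - f2_hz) * t_s))


f1, f2 = 1000.0, 1004.0          # illustrative tone frequencies
print(beat_rate_hz(f1, f2))      # 4.0 beats per second

# The waveform never exceeds the envelope, which repeats every 1 / 4 s here.
for n in range(2000):
    t = n / 8000.0
    assert abs(two_tone_sample(f1, f2, t)) <= envelope(f1, f2, t) + 1e-9
```

With a 4 Hz difference the envelope falls to zero at 0.125 s and returns to its maximum at 0.25 s, which a listener would hear as a slow periodic change in loudness.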
Combination tones are products of a signal and a masker. They arise when the two sounds interact to produce a new sound, which can be more audible than the original signal. This is caused by the nonlinear distortion that occurs in the ear. For example, the combination tone of two maskers can be a better masker than the two original maskers alone.
The sounds interact in many ways depending on the difference in frequency between the two sounds. The two most important products are cubic difference tones and quadratic difference tones.
Cubic difference tones are given by the expression
2F1 – F2
(F1 being the first frequency, F2 the second). These are audible most of the time, and especially when the level of the original tone is low. Hence they have a greater effect on psychoacoustic tuning curves than quadratic difference tones.
Quadratic difference tones are the result of
F2 – F1
These occur only at relatively high levels and hence have a lesser effect on psychoacoustic tuning curves.
Because combination tones behave like stimulus tones, much as the original primary tones do, they can themselves interact with the primary tones to produce secondary combination tones. An example of this is
3F1 – 2F2
Secondary combination tones are again similar to the combination tones of the primary tone.
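The difference-tone expressions in this section can be evaluated with a short sketch. It assumes the usual convention that F1 is the lower primary frequency, and it uses the standard form 2F1 – F2 for the cubic difference tone; the function name and the 1000 Hz and 1200 Hz primaries are illustrative assumptions.

```python
def combination_tones_hz(f1_hz, f2_hz):
    """Return difference-tone frequencies for primaries f1_hz < f2_hz."""
    if not f1_hz < f2_hz:
        raise ValueError("expected f1_hz < f2_hz")
    return {
        "cubic": 2 * f1_hz - f2_hz,          # 2F1 - F2
        "quadratic": f2_hz - f1_hz,          # F2 - F1
        "secondary": 3 * f1_hz - 2 * f2_hz,  # 3F1 - 2F2, the example given above
    }


tones = combination_tones_hz(1000, 1200)
print(tones)  # {'cubic': 800, 'quadratic': 200, 'secondary': 600}
```

For these primaries, the secondary tone at 600 Hz is the same frequency a cubic difference tone would have if the 800 Hz combination tone and the 1000 Hz primary acted as a new pair (2 × 800 – 1000), which is one way to read the statement that combination tones behave like stimulus tones.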
Auditory masking is exploited to perform data compression for sound signals (MP3).